Abstract

We derive rates of contraction of posterior distributions on nonparametric models resulting from sieve priors. The aim of the paper is to provide general conditions to get posterior rates when the parameter space has a general structure, and rate adaptation when the parameter space is, e.g., a Sobolev class. The conditions employed, although standard in the literature, are combined in a different way. The results are applied to density, regression, nonlinear autoregression and Gaussian white noise models. In the latter we have also considered a loss function which is different from the usual $l^2$ norm, namely the pointwise loss. In this case it is possible to prove that the adaptive Bayesian approach for the $l^2$ loss is strongly suboptimal and we provide a lower bound on the rate.
1 Introduction

The asymptotic behaviour of posterior distributions in nonparametric models has received growing consideration in the literature over the last ten years. Many different models have been considered, ranging from the problem of density estimation in i.i.d. models (Barron et al. 1999; Ghosal et al. 2000), to sophisticated dependent models (Rousseau et al. 2012). For these models, different families of priors have also been considered, where the most common are Dirichlet process mixtures (or related priors), Gaussian processes (van der Vaart and van Zanten 2008), and series expansions on a basis such as wavelets (Abramovich et al. 1998).
In this paper we focus on a family of priors called sieve priors, introduced as compound priors and discussed by Zhao (1993, 2000), and further studied by Shen and Wasserman (2001). It is defined for models $(P_\theta^{(n)} : \theta \in \Theta)$, $n \in \mathbb{N}\setminus\{0\}$, where $\Theta \subseteq \mathbb{R}^{\mathbb{N}}$, the set of sequences. Let $\mathcal{A}$ be a $\sigma$-field associated to $\Theta$. The observations are denoted $X^n$, where the asymptotics are driven by $n$. The probability measures $P_\theta^{(n)}$ are dominated by some reference measure $\mu$, with density $p_\theta^{(n)}$. Remark that such an infinite-dimensional parameter $\theta$ can often characterize a functional parameter, or a curve, $f = f_\theta$. For instance, in regression, density or spectral density models, $f$ represents a regression function, a log density or a log spectral density respectively, and $\theta$ represents its coordinates in an appropriate basis $\psi = (\psi_j)_{j\ge 1}$ (e.g., a Fourier, a wavelet, a log spline, or an orthonormal basis in general). In this paper we study frequentist properties of the posterior distributions as $n$ tends to infinity, assuming that data $X^n$ are generated by a measure $P_{\theta_0}^{(n)}$, $\theta_0 \in \Theta$. We study in particular rates of contraction of the posterior distribution and rates of convergence of the risk.

A sieve prior $\Pi$ is expressed as

$$\Pi(\cdot) = \sum_{k=1}^{\infty} \pi(k)\,\Pi_k(\cdot), \tag{1}$$

where $\sum_k \pi(k) = 1$, $\pi(k) \ge 0$, and the $\Pi_k$'s are prior distributions on so-called sieve spaces $\Theta_k = \mathbb{R}^k$. Set $\theta_k = (\theta_1, \ldots, \theta_k)$ the finite-dimensional vector of the first $k$ entries of $\theta$. Essentially, the whole prior $\Pi$ is seen as a hierarchical prior, see Figure 1. The hierarchical parameter $k$, called threshold parameter, has prior $\pi$. Conditionally on $k$, the prior on $\theta$ is $\Pi_k$, which is supposed to have mass only on $\Theta_k$ (this amounts to saying that the priors on the remaining entries $\theta_j$, $j > k$, are point masses at 0). We assume that $\Pi_k$ is an independent prior on the coordinates $\theta_j$, $j = 1, \ldots, k$, of $\theta_k$ with a unique probability density $g$ once rescaled by positive $\tau = (\tau_j)_{j\ge 1}$. Using the same notation $\Pi_k$ for probability and density with respect to Lebesgue measure on $\mathbb{R}^k$, we have

$$\forall \theta_k \in \Theta_k, \quad \Pi_k(\theta_k) = \prod_{j=1}^{k} \frac{1}{\tau_j}\, g\!\left(\frac{\theta_j}{\tau_j}\right). \tag{2}$$

Fig. 1: Graphical representation of the hierarchical structure of the sieve prior given by Equation (1).



Note that the quantities $\Pi$, $\Pi_k$, $\pi$, $\tau$ and $g$ could depend on $n$. Although not purely Bayesian, data dependent priors are quite common in the literature. For instance, Ghosal and van der Vaart (2007) use a similar prior with a deterministic cutoff $k = \lfloor n^{1/(2\alpha+1)} \rfloor$ in application 7.6.

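We will also consider the case where the prior is truncated to an $l^1$ ball of radius $r_1 > 0$ (see the nonlinear AR(1) model application in Section 2.3.3):

$$\forall \theta_k \in \Theta_k, \quad \Pi_k(\theta_k) \propto \prod_{j=1}^{k} \frac{1}{\tau_j}\, g\!\left(\frac{\theta_j}{\tau_j}\right) \mathbb{I}\Big(\sum_{j=1}^{k} |\theta_j| \le r_1\Big). \tag{3}$$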



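The posterior distribution $\Pi(\,\cdot \mid X^n)$ is defined by, for all measurable sets $B$ of $\Theta$,

$$\Pi(B \mid X^n) = \frac{\int_B p_\theta^{(n)}(X^n)\, d\Pi(\theta)}{\int_\Theta p_\theta^{(n)}(X^n)\, d\Pi(\theta)}. \tag{4}$$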

Given the sieve prior $\Pi$, we study the rate of contraction of the posterior distribution in $P_{\theta_0}^{(n)}$-probability with respect to a semimetric $d_n$ on $\Theta$. This rate is defined as the best possible (i.e. the smallest) sequence $(\epsilon_n)_{n\ge 1}$ such that

$$\Pi\big(\theta : d_n^2(\theta, \theta_0) \ge M \epsilon_n^2 \mid X^n\big) \underset{n\to\infty}{\longrightarrow} 0,$$

in $P_{\theta_0}^{(n)}$-probability, for some $\theta_0 \in \Theta$ and a positive constant $M$, which can be chosen as large as needed. We also derive convergence rates for the posterior loss $\Pi(d_n^2(\theta, \theta_0) \mid X^n)$ in $P_{\theta_0}^{(n)}$-probability.
The posterior concentration rate is optimal when it coincides with the minimax rates of convergence, when $\theta_0$ belongs to a given functional class, associated to the same semimetric $d_n$. Typically these minimax rates of convergence are defined for functional classes indexed by a smoothness parameter: Sobolev, Hölder, or more generally Besov spaces.

The objective of this paper is to find mild generic assumptions on the sieve prior $\Pi$ of the form (1), on models $P_\theta^{(n)}$ and on $d_n$, such that the procedure adapts to the optimal rate in the minimax sense, both for the posterior distribution and for the risk. Results in the Bayesian nonparametrics literature about contraction rates are usually of two kinds. Firstly, general assumptions on priors and models allow one to derive rates, see for example Shen and Wasserman (2001); Ghosal et al. (2000); Ghosal and van der Vaart (2007). Secondly, other papers focus on a particular prior and obtain contraction rates in a particular model, see for instance Belitser and Ghosal (2003) in the white noise model, De Jonge and van Zanten (2010) in regression, and Scricciolo (2006) in density. The novelty of this paper is that our results hold for a family of priors (sieve priors) without a specific underlying model, and can be applied to different models.

An additional interesting property that is sought at the same time as convergence rates is adaptation. This means that, once a loss function (a semimetric $d_n$ on $\Theta$) and a collection of classes of different smoothnesses for the parameter are specified, one constructs a procedure which is independent of the smoothness, but which is rate optimal (under the given loss $d_n$) within each class. Indeed, the optimal rate naturally depends on the smoothness of the parameter, and standard straightforward estimation techniques usually use it as an input. This issue is all the more important as relatively few instances are available in the Bayesian literature in this area. That property is often obtained when the unknown parameter is assumed to belong to a discrete set, see for example Belitser and Ghosal (2003). There exist some results in the context of density estimation by Huang (2004), Scricciolo (2006), Ghosal et al. (2008), van der Vaart and van Zanten (2009), Rivoirard and Rousseau (2012a), Rousseau (2010) and Kruijer et al. (2010), in regression by De Jonge and van Zanten (2010), and in spectral density estimation by Rousseau and Kruijer (2011). What enables adaptation in our results is the thresholding induced by the prior on $k$: the posterior distribution of parameter $k$ concentrates around values that are the typical efficient size of models of the true smoothness.



As seen from our assumptions in Section 2.1 and from the general results (Theorem 1 and Corollary 1), adaptation is relatively straightforward under sieve priors defined by (1) when the semimetric is a global loss function which acts like the Kullback-Leibler divergence, the $l^2$ norm on $\theta$ in the regression problem, or the Hellinger distance in the density problem. If the loss function (or the semimetric) $d_n$ acts differently, then the posterior distribution (or the risk) can be quite different (suboptimal). This is illustrated in Section 3.2 for the white noise model (16) when the loss is a local loss function as in the case of the estimation of $f(t)$, for a given $t$, where $d_n(f, f_0) = (f(t) - f_0(t))^2$. This phenomenon has been encountered also by Rousseau and Kruijer (2011). It is not merely a Bayesian issue: Cai et al. (2007) show that an optimal estimator under global loss cannot be locally optimal at each point $f(t)$ in the white noise model. The penalty between global and local rates is at least a $\log n$ term. Abramovich et al. (2004) and Abramovich et al. (2007a) obtain similar results with Bayesian wavelet estimators in the same model.

The paper is organized as follows. Section 2 first provides a general result on rates of contraction for the posterior distribution in the setting of sieve priors. We also derive a result in terms of posterior loss, and show that the rates are adaptive optimal for Sobolev smoothness classes. The section ends with applications to the density, the regression function and the nonlinear autoregression function estimation. In Section 3 we study more precisely the case of the white noise model, which is a benchmark model. We study in detail the difference between global and pointwise losses in this model, and provide a lower bound for the latter loss, showing that sieve priors lead to suboptimal contraction rates. Proofs are deferred to the Appendix.



Notations 

We use the following notations. Vectors are written in bold letters, for example $\boldsymbol{\theta}$ or $\boldsymbol{\theta}_0$, while light-face is used for their entries, like $\theta_j$ or $\theta_{0j}$. We denote by $\theta_{0k}$ the projection of $\theta_0$ on its first $k$ coordinates, and by $p_{0k}^{(n)}$ and $p_0^{(n)}$ the densities of the observations in the corresponding models. We denote by $d_n$ a semimetric, by $\|\cdot\|_2$ the $l^2$ norm (on vectors) in $\Theta$ or the $L^2$ norm (on curves $f$), and by $\|\cdot\|_{2,k}$ the $l^2$ norm restricted to the first $k$ coordinates of a parameter. Expectations $E_0^{(n)}$ and $E_\theta^{(n)}$ are defined with respect to $P_0^{(n)}$ and $P_\theta^{(n)}$ respectively. The same notation $\Pi(\,\cdot \mid X^n)$ is used for posterior probability or posterior expectation. The expected posterior risk and the frequentist risk relative to $d_n$ are defined and denoted by $\mathcal{R}_n^{d_n}(\theta_0) = E_0^{(n)} \Pi(d_n^2(\theta, \theta_0) \mid X^n)$ and $R_n^{d_n}(\theta_0) = E_0^{(n)}(d_n^2(\hat\theta, \theta_0))$ respectively (for an estimator $\hat\theta$ of $\theta_0$), where the mention of $\theta_0$ might be omitted (cf. Robert 2007, Section 2.3). We denote by $\varphi$ the standard Gaussian probability density.

Let $K$ denote the Kullback-Leibler divergence $K(f, g) = \int f \log(f/g)\, d\mu$, and $V_{m,0}$ denote the $m$-th centered moment $V_{m,0}(f, g) = \int f\, |\log(f/g) - K(f, g)|^m\, d\mu$, with $m \ge 2$.

Define two additional divergences $\tilde K$ and $\tilde V_{m,0}$, which are expectations with respect to $p_0^{(n)}$: $\tilde K(f, g) = \int p_0^{(n)} |\log(f/g)|\, d\mu$ and $\tilde V_{m,0}(f, g) = \int p_0^{(n)} |\log(f/g) - K(f, g)|^m\, d\mu$.

We denote by $C$ a generic constant whose value is of no importance and we use $\lesssim$ for inequalities up to a multiplicative constant.


2 General case 



In this section we give a general theorem which provides an upper bound on posterior contraction rates $\epsilon_n$. Throughout the section, we assume that the sequence of positive numbers $(\epsilon_n)_{n\ge 1}$, or $(\epsilon_n(\beta))_{n\ge 1}$ when we point to a specific value of smoothness $\beta$, is such that $\epsilon_n \to 0$ and $n\epsilon_n^2/\log n \to \infty$. We introduce the following numbers

$$j_n = \lfloor j_0\, n\epsilon_n^2/\log(n) \rfloor, \qquad k_n = \lfloor M_0\, j_n \log(n)/L(n) \rfloor, \tag{5}$$

for $j_0 > 0$, $M_0 > 1$, where $L$ is a slowly varying function such that $L \le \log$, hence $j_n \le k_n$. We use $k_n$ to define the following approximation subsets of $\Theta$:

$$\Theta_{k_n}(Q) = \big\{\theta \in \Theta_{k_n} : \|\theta\|_{2,k_n} \le n^{Q}\big\}$$

for $Q > 0$. Note that the prior actually charges a union of spaces of dimension $k$, $k \ge 1$, so that $\Theta_{k_n}(Q)$ can be seen as a union of spaces of dimension $k \le k_n$. Lemma 2 provides an upper bound on the prior mass of the complement $\Theta_{k_n}^c(Q)$.
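As a rough numerical illustration of Equation (5), the sketch below computes $j_n$ and $k_n$ for a rate of the Sobolev form $\epsilon_n(\beta) = \epsilon_0 (\log n/n)^{\beta/(2\beta+1)}$ used in Proposition 1. The function names and the default values `j0=1.0`, `M0=2.0`, `eps0=1.0` are illustrative assumptions, not values taken from the paper.

```python
import math

def sobolev_rate(n, beta, eps0=1.0):
    """Rate eps_n(beta) = eps0 * (log n / n)^(beta / (2 beta + 1)); eps0 is a placeholder constant."""
    return eps0 * (math.log(n) / n) ** (beta / (2.0 * beta + 1.0))

def sieve_dimensions(n, eps_n, j0=1.0, M0=2.0, L=math.log):
    """Return (j_n, k_n) as in Equation (5).

    L is the slowly varying function of the prior on k: math.log for a
    Poisson prior, a constant function for a geometric prior.
    """
    log_n = math.log(n)
    j_n = math.floor(j0 * n * eps_n ** 2 / log_n)
    k_n = math.floor(M0 * j_n * log_n / L(n))
    return j_n, k_n

if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        eps = sobolev_rate(n, beta=1.0)
        print(n, sieve_dimensions(n, eps), sieve_dimensions(n, eps, L=lambda m: 1.0))
```

With $L = \log$ the two dimensions differ only by the factor $M_0$, whereas a constant $L$ inflates $k_n$ by a $\log n$ factor.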



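It has been shown (Ghosal et al. 2000; Ghosal and van der Vaart 2007; Shen and Wasserman 2001) that an efficient way to derive rates of contraction of posterior distributions is to bound from above the numerator of (4) using tests (and $k_n$ for the increasing sequence $\Theta_{k_n}(Q)$), and to bound from below its denominator using an approximation of $p_0^{(n)}$ based on a value $\theta \in \Theta_{j_n}$ close to $\theta_0$. The latter is done in Lemma 3, where we use $j_n$ to define the finite component approximation $\theta_{0j_n}$ of $\theta_0$, and we show that the prior mass of the following Kullback-Leibler neighbourhoods of $\theta_0$, $\mathcal{B}_n(m)$, $n \in \mathbb{N}^*$, is lower bounded by an exponential term:

$$\mathcal{B}_n(m) = \Big\{\theta : K\big(p_0^{(n)}, p_\theta^{(n)}\big) \le 2n\epsilon_n^2,\ V_{m,0}\big(p_0^{(n)}, p_\theta^{(n)}\big) \le 2^{m+1}(n\epsilon_n^2)^{m/2}\Big\}.$$

Define two neighbourhoods of $\theta_0$ in the sieve space $\Theta_{j_n}$: $\tilde{\mathcal{B}}_n(m)$, similar to $\mathcal{B}_n(m)$ but using $\tilde K$ and $\tilde V_{m,0}$, and $\mathcal{A}_n(H_1)$, an $l^2$ ball of radius $n^{-H_1}$, $H_1 > 0$:

$$\tilde{\mathcal{B}}_n(m) = \Big\{\theta \in \Theta_{j_n} : \tilde K\big(p_{0j_n}^{(n)}, p_\theta^{(n)}\big) \le n\epsilon_n^2,\ \tilde V_{m,0}\big(p_{0j_n}^{(n)}, p_\theta^{(n)}\big) \le (n\epsilon_n^2)^{m/2}\Big\},$$

$$\mathcal{A}_n(H_1) = \Big\{\theta \in \Theta_{j_n} : \|\theta_{0j_n} - \theta\|_{2,j_n} \le n^{-H_1}\Big\}.$$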

2.1 Assumptions 

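The following technical assumptions are involved in the subsequent analysis, and are discussed at the end of this section. Recall that the true parameter is $\theta_0$, under which the observations have density $p_0^{(n)}$.

A1 Condition on $p_0^{(n)}$ and $\epsilon_n$. For $n$ large enough and for some $m > 0$,

$$K\big(p_0^{(n)}, p_{0j_n}^{(n)}\big) \le n\epsilon_n^2 \quad \text{and} \quad V_{m,0}\big(p_0^{(n)}, p_{0j_n}^{(n)}\big) \le (n\epsilon_n^2)^{m/2}.$$

A2 Comparison between norms. The following inclusion holds in $\Theta_{j_n}$:

$$\exists H_1 > 0 \ \text{s.t.} \ \mathcal{A}_n(H_1) \subset \tilde{\mathcal{B}}_n(m).$$

A3 Comparison between $d_n$ and $l^2$. There exist three non negative constants $D_0, D_1, D_2$ such that, for any two $\theta, \theta' \in \Theta_{k_n}(Q)$,

$$d_n(\theta, \theta') \le D_0\, k_n^{D_1}\, \|\theta - \theta'\|_{2,k_n}^{D_2}.$$

A4 Test Condition. There exist two positive constants $c_1$ and $\zeta < 1$ such that, for every $\theta_1 \in \Theta_{k_n}(Q)$, there exists a test $\phi_n(\theta_1) \in [0, 1]$ which satisfies

$$E_0^{(n)}\big(\phi_n(\theta_1)\big) \le e^{-c_1 n d_n^2(\theta_0, \theta_1)} \quad \text{and} \quad \sup_{d_n(\theta, \theta_1) < \zeta d_n(\theta_0, \theta_1)} E_\theta^{(n)}\big(1 - \phi_n(\theta_1)\big) \le e^{-c_1 n d_n^2(\theta_0, \theta_1)}.$$

A5 On the prior $\Pi$. There exist positive constants $a, b, G_1, G_2, G_3, G_4, \alpha$ and $\tau_0$ such that $\pi$ satisfies

$$\forall k = 1, 2, \ldots, \quad e^{-akL(k)} \le \pi(k) \le e^{-bkL(k)}, \tag{6}$$

where the function $L$ is the slowly varying function introduced in Equation (5); $g$ satisfies

$$\forall \theta \in \mathbb{R}, \quad G_1 e^{-G_2 |\theta|^\alpha} \le g(\theta) \le G_3 e^{-G_4 |\theta|^\alpha}. \tag{7}$$

The scales $\tau$ defined in Equation (2) satisfy the following conditions

$$\max_{j \ge 1} \tau_j \le \tau_0, \tag{8}$$

$$\min_{j \le k_n} \tau_j \ge n^{-H_2}, \tag{9}$$

$$\sum_{j=1}^{j_n} |\theta_{0j}|^\alpha / \tau_j^\alpha \le C j_n \log n. \tag{10}$$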

Remark 1.

• Conditions A1 and A2 are local in that they need to be checked at the true parameter $\theta_0$ only. They are useful to prove that the prior puts sufficient mass around Kullback-Leibler neighbourhoods of the true probability. Condition A1 is a limiting factor to the rate: it characterizes $\epsilon_n$ through the capacity of approximation of $p_0^{(n)}$ by $p_{0j_n}^{(n)}$: the smoother $p_0^{(n)}$, the closer $p_0^{(n)}$ and $p_{0j_n}^{(n)}$, and the faster $\epsilon_n$. In many models, they are ensured because $K(p_0^{(n)}, p_{0j_n}^{(n)})$ and $V_{m,0}(p_0^{(n)}, p_{0j_n}^{(n)})$ can be written locally (meaning around $\theta_0$) in terms of the $l^2$ norm $\|\theta_0 - \theta_{0j_n}\|_2$ directly. Smoothness assumptions are then typically required to control $\|\theta_0 - \theta_{0j_n}\|_2$. It is the case for instance for Sobolev and Besov smoothnesses (cf. Equation (12)). The control is expressed with a power of $j_n$, whose comparison to $\epsilon_n$ provides in turn a tight way to tune the rate (cf. the proof of Proposition 1).

  Note that the constant $H_1$ in Condition A2 can be chosen as large as needed: if A2 holds for a specified positive constant $H_0$, then it does for any $H_1 > H_0$. This makes the condition quite loose. A more stringent, if simpler, version of A2 is the following.

  A'2 Comparison between norms. For any $\theta \in \Theta_{j_n}$,

  $$\tilde K\big(p_{0j_n}^{(n)}, p_\theta^{(n)}\big) \le C n \|\theta_{0j_n} - \theta\|_{2,j_n}^2 \quad \text{and} \quad \tilde V_{m,0}\big(p_{0j_n}^{(n)}, p_\theta^{(n)}\big) \le C n^{m/2} \|\theta_{0j_n} - \theta\|_{2,j_n}^m.$$

  This is satisfied in the Gaussian white noise model (see Section 3).

• Condition A3 is generally mild. The reverse is more stringent since $d_n$ may be bounded, as is the case with the Hellinger distance. A3 is satisfied in many common situations, see for example the applications later on. Technically, this condition allows one to switch from a covering number (or entropy) in terms of the $l^2$ norm to a covering number in terms of the semimetric $d_n$.

• Condition A4 is common in the Bayesian nonparametric literature. A review of different models and their corresponding tests is given in Ghosal and van der Vaart (2007) for example. The tests strongly depend on the semimetric $d_n$.

• Condition A5 concerns the prior. Equations (6) and (7) state that the tails of $\pi$ and $g$ have to be at least exponential or of exponential type. For instance, if $\pi$ is the geometric distribution, $L = 1$, and if it is the Poisson distribution, $L(k) = \log(k)$ (both are slowly varying functions). Laplace and Gaussian distributions are covered by $g$, with $\alpha = 1$ and $\alpha = 2$ respectively. These equations aim at controlling the prior mass of $\Theta_{k_n}^c(Q)$, the complement of $\Theta_{k_n}(Q)$ in $\Theta$ (see Lemma 2). The case where the scale $\tau$ depends on $n$ is considered in Babenko and Belitser (2009, 2010) in the white noise model. Here the constraints on $\tau$ are rather mild since the scales are allowed to go to zero polynomially as a function of $n$, and must be upper bounded. Rivoirard and Rousseau (2012a) study a family of scales $\tau = (\tau_j)_{j\ge 1}$ that are decreasing polynomially with $j$. Here the prior is more general and encompasses both frameworks. Equations (6) - (10) are needed in Lemmas 2 and 3 for bounding respectively $\Pi(\Theta_{k_n}^c(Q))$ from above and $\Pi(\mathcal{B}_n(m))$ from below. A smoothness assumption on $\theta_0$ is usually required for Equation (10).



2.2 Results 

2.2.1 Concentration and posterior loss 

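The following theorem provides an upper bound for the rate of contraction of the posterior distribution.

Theorem 1. If Conditions A1 - A5 hold, then for $M$ large enough and for $L$ introduced in Equation (5),

$$E_0^{(n)}\, \Pi\Big(\theta : d_n^2(\theta, \theta_0) \ge M \frac{\log n}{L(n)} \epsilon_n^2 \,\Big|\, X^n\Big) = O\big((n\epsilon_n^2)^{-m/2}\big) \underset{n\to\infty}{\longrightarrow} 0.$$

Proof. See the Appendix. □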

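The convergence of the posterior distribution at the rate $\epsilon_n$ implies that the expected posterior risk converges (at least) at the same rate $\epsilon_n$, when $d_n$ is bounded.

Corollary 1. Under the assumptions of Theorem 1, with a value of $m$ in Conditions A1 and A2 such that $(n\epsilon_n^2)^{-m/2} = O(\epsilon_n^2)$, and if $d_n$ is bounded on $\Theta$, then the expected posterior risk given $\theta_0$ and $\Pi$ converges at least at the same rate $\epsilon_n$:

$$\mathcal{R}_n^{d_n} = E_0^{(n)}\, \Pi\big(d_n^2(\theta, \theta_0) \mid X^n\big) = O\Big(\frac{\log n}{L(n)} \epsilon_n^2\Big).$$

Proof. Denote by $D$ the bound of $d_n$, i.e. for all $\theta, \theta' \in \Theta$, $d_n(\theta, \theta') \le D$. We have

$$\mathcal{R}_n^{d_n} \le M \frac{\log n}{L(n)} \epsilon_n^2 + D^2\, E_0^{(n)}\, \Pi\Big(\theta : d_n^2(\theta, \theta_0) \ge M \frac{\log n}{L(n)} \epsilon_n^2 \,\Big|\, X^n\Big),$$

so $\mathcal{R}_n^{d_n} = O\big(\log n / L(n)\, \epsilon_n^2\big)$ by Theorem 1 and the assumption on $m$. □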

Remark 2. The condition on $m$ in Corollary 1 requires $n\epsilon_n^2$ to grow as a power of $n$. When $\theta_0$ has Sobolev smoothness $\beta$, this is the case since $\epsilon_n$ is typically of order $(n/\log n)^{-\beta/(2\beta+1)}$. The condition on $m$ boils down to $m \ge 4\beta$. When $\theta_0$ is smoother, e.g. in a Sobolev space with exponential weights, the rate is typically of order $\log n/\sqrt{n}$. Then a common way to proceed is to resort to an exponential inequality for controlling the denominator of the posterior distribution of Equation (4) (see e.g. Rivoirard and Rousseau 2012b).
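The threshold $m \ge 4\beta$ can be checked numerically. The sketch below is only an illustration under the assumption $\epsilon_n = (\log n/n)^{\beta/(2\beta+1)}$ with constant 1; it returns the logarithm of the ratio $(n\epsilon_n^2)^{-m/2}/\epsilon_n^2$, which must stay bounded for the condition of Corollary 1 to hold. The function name is hypothetical.

```python
import math

def log_ratio(n, beta, m):
    """log of (n eps_n^2)^(-m/2) / eps_n^2 for eps_n = (log n / n)^(beta / (2 beta + 1)).

    Corollary 1 asks for (n eps_n^2)^(-m/2) = O(eps_n^2), i.e. this
    quantity should not grow with n.
    """
    eps2 = (math.log(n) / n) ** (2.0 * beta / (2.0 * beta + 1.0))
    return -(m / 2.0) * math.log(n * eps2) - math.log(eps2)

if __name__ == "__main__":
    for m in (2, 4, 6):
        print(m, [round(log_ratio(n, 1.0, m), 2) for n in (10**4, 10**8, 10**12)])
```

For $\beta = 1$, the printed values increase with $n$ when $m = 2$ and decrease when $m \ge 4$, in line with the remark.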

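Remark 3. We can note that this result is meaningful from a non Bayesian point of view as well. Indeed, let $\hat\theta$ be the posterior mean estimate of $\theta$ with respect to $\Pi$. Then, if $\theta \mapsto d_n^2(\theta, \theta_0)$ is convex, we have by Jensen's inequality

$$d_n^2(\hat\theta, \theta_0) \le \Pi\big(d_n^2(\theta, \theta_0) \mid X^n\big),$$

so the frequentist risk converges at the same rate $\epsilon_n$:

$$R_n^{d_n} = E_0^{(n)}\, d_n^2(\hat\theta, \theta_0) \le \mathcal{R}_n^{d_n} = O\Big(\frac{\log n}{L(n)} \epsilon_n^2\Big).$$

Note that we have no result for general pointwise estimates $\hat\theta$, for instance for the MAP. The latter was studied in Abramovich et al. (2007b, 2010).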



2.2.2 Adaptation 

When considering a given class of smoothness for the parameter $\theta_0$, the minimax criterion implies an optimal rate of convergence. Posterior (resp. risk) adaptation means that the posterior distribution (resp. the risk) concentrates at the optimal rate for a class of possible smoothness values.

We consider here Sobolev classes $\Theta_\beta(L_0)$ for univariate problems defined by

$$\Theta_\beta(L_0) = \Big\{\theta : \sum_{j=1}^{\infty} \theta_j^2\, j^{2\beta} < L_0\Big\}, \quad \beta > 1/2,\ L_0 > 0, \tag{11}$$

with smoothness parameter $\beta$ and radius $L_0$. If $\theta_0 \in \Theta_\beta(L_0)$, then one has the following bound

$$\|\theta_0 - \theta_{0j_n}\|_2^2 = \sum_{j=j_n+1}^{\infty} \theta_{0j}^2\, j^{2\beta} j^{-2\beta} \le L_0\, j_n^{-2\beta}. \tag{12}$$

Donoho and Johnstone (1998) give the global (i.e. under the $l^2$ loss) minimax rate $n^{-\beta/(2\beta+1)}$ attached to the Sobolev class of smoothness $\beta$. We show that under an additional condition between $K$, $V_{m,0}$ and $l^2$, the upper bound $\epsilon_n$ on the rate of contraction can be chosen equal to the optimal rate, up to a $\log n$ term.
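The tail bound (12) can be illustrated on a concrete sequence. The sketch below assumes hypothetical coefficients $\theta_{0j} = j^{-2}$, truncated to 2000 entries for computation, and takes $\beta = 1$; the radius `L0` is set just above the computed Sobolev energy. None of these choices come from the paper.

```python
def sobolev_energy(theta, beta):
    """Sum of theta_j^2 * j^(2 beta), the quantity bounded by L0 in Equation (11)."""
    return sum(t * t * j ** (2.0 * beta) for j, t in enumerate(theta, start=1))

def tail_energy(theta, j_n):
    """Squared l2 norm of the coordinates beyond the first j_n."""
    return sum(t * t for t in theta[j_n:])

# Hypothetical true parameter, truncated to finitely many coordinates.
theta0 = [j ** -2.0 for j in range(1, 2001)]
beta = 1.0
L0 = 1.0001 * sobolev_energy(theta0, beta)  # any radius above the energy works

if __name__ == "__main__":
    for j_n in (1, 5, 50, 500):
        print(j_n, tail_energy(theta0, j_n), L0 * j_n ** (-2.0 * beta))
```

The left column of values (the tail) stays below the right one (the bound $L_0 j_n^{-2\beta}$) for every cutoff.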

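Proposition 1. Let $L_0$ denote a positive fixed radius, and $\beta_2 \ge \beta_1 > 1/2$. If for $n$ large enough, there exists a positive constant $C_0$ such that

$$\sup_{\beta_1 \le \beta \le \beta_2}\ \sup_{\theta_0 \in \Theta_\beta(L_0)} K\big(p_0^{(n)}, p_{0j_n}^{(n)}\big) \le C_0\, n\, \|\theta_0 - \theta_{0j_n}\|_2^2, \quad \text{and}$$

$$\sup_{\beta_1 \le \beta \le \beta_2}\ \sup_{\theta_0 \in \Theta_\beta(L_0)} V_{m,0}\big(p_0^{(n)}, p_{0j_n}^{(n)}\big) \le C_0^m\, n^{m/2}\, \|\theta_0 - \theta_{0j_n}\|_2^m, \tag{13}$$

and if Conditions A2 - A5 hold with constants independent of $\theta_0$ in the set $\cup_{\beta_1 \le \beta \le \beta_2} \Theta_\beta(L_0)$, then for $M$ sufficiently large,

$$\sup_{\beta_1 \le \beta \le \beta_2}\ \sup_{\theta_0 \in \Theta_\beta(L_0)} E_0^{(n)}\, \Pi\Big(\theta : d_n^2(\theta, \theta_0) \ge M \frac{\log n}{L(n)} \epsilon_n^2(\beta) \,\Big|\, X^n\Big) \underset{n\to\infty}{\longrightarrow} 0,$$

with

$$\epsilon_n(\beta) = \epsilon_0 \Big(\frac{\log n}{n}\Big)^{\frac{\beta}{2\beta+1}},$$

and $\epsilon_0$ depending on $L_0$, $C_0$ and the constants involved in the assumptions, but not depending on $\beta$.

Remark 4. In the standard case where $d_n$ is the $l^2$ norm, $\epsilon_n$ is the optimal rate of contraction, up to a $\log n$ term (which is quite common in Bayesian nonparametric computations).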



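Proof. Let $\beta \in [\beta_1, \beta_2]$ and $\theta_0 \in \Theta_\beta(L_0)$. Then $\theta_0$ satisfies Equation (12), and Condition (13) implies that

$$K\big(p_0^{(n)}, p_{0j_n}^{(n)}\big) \le C_0 L_0\, n\, j_n^{-2\beta} \quad \text{and} \quad V_{m,0}\big(p_0^{(n)}, p_{0j_n}^{(n)}\big) \le C_0^m L_0^{m/2}\, n^{m/2}\, j_n^{-m\beta}.$$

For given $\theta_0$ and $\beta$, the result of Theorem 1 holds if Condition A1 is satisfied. This is the case if we choose $\epsilon_n(\beta, \theta_0) \gtrsim C_0 L_0\, j_n^{-\beta}$, provided that the bounds in Conditions A2 - A5 and in Equation (13) are uniform. Combined with $j_n = \lfloor j_0 n\epsilon_n^2/\log n \rfloor$, it gives as a tight choice $\epsilon_n(\beta, \theta_0) = \epsilon_0(\beta, \theta_0)(\log n/n)^{\beta/(2\beta+1)}$ with $\epsilon_0(\beta, \theta_0) \lesssim (L_0 C_0 j_0^{-\beta})^{1/(2\beta+1)}$. So there exists a bound $\epsilon_0 > 0$ such that $\sup_{\beta_1 \le \beta \le \beta_2} \sup_{\theta_0 \in \Theta_\beta(L_0)} \epsilon_0(\beta, \theta_0) = \epsilon_0 < \infty$, which concludes the proof. □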
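The balancing step at the end of the proof can be made explicit: solving $\epsilon = c\, j_n^{-\beta}$ together with $j_n = j_0 n \epsilon^2/\log n$ gives $\epsilon^{2\beta+1} = c\, j_0^{-\beta} (\log n/n)^{\beta}$. The sketch below, which ignores the integer part in Equation (5) and uses a generic constant `c` in place of the constants of the proof (both are simplifying assumptions), checks this fixed point numerically.

```python
import math

def balanced_rate(n, beta, c, j0):
    """Closed-form solution of eps = c * j_n^(-beta) with j_n = j0 * n * eps^2 / log n."""
    eps0 = (c * j0 ** (-beta)) ** (1.0 / (2.0 * beta + 1.0))
    return eps0 * (math.log(n) / n) ** (beta / (2.0 * beta + 1.0))

def fixed_point_residual(n, beta, c, j0):
    """Difference between eps and c * j_n^(-beta); zero at the balanced rate."""
    eps = balanced_rate(n, beta, c, j0)
    j_n = j0 * n * eps ** 2 / math.log(n)  # Equation (5) without the floor
    return eps - c * j_n ** (-beta)

if __name__ == "__main__":
    print(balanced_rate(10**6, 1.5, 2.0, 1.0), fixed_point_residual(10**6, 1.5, 2.0, 1.0))
```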
2.3 Examples

In this section, we apply our results of contraction of Sections 2.2.1 and 2.2.2 to a series of models. The Gaussian white noise example is studied in detail in Section 3. We suppose in each model that $\theta_0 \in \Theta_\beta(L_0)$, where $\Theta_\beta(L_0)$ is defined in Equation (11).

Throughout, we consider the following prior $\Pi$ on $\Theta$ (or on a curve space through the coefficients of the functions in a basis). Let the prior distribution $\pi$ on $k$ be Poisson with parameter $\lambda$, and given $k$, the prior distribution on $\theta_j/\tau_j$, $j = 1, \ldots, k$, be standard Gaussian,

$$k \sim \text{Poisson}(\lambda), \qquad \frac{\theta_j}{\tau_j} \,\Big|\, k \sim \mathcal{N}(0, 1), \quad j = 1, \ldots, k, \ \text{independently}. \tag{14}$$

It satisfies Equation (6) with function $L(k) = \log(k)$ and Equation (7) with $\alpha = 2$. Choose then $\tau_j = \tau_0\, j^{-q}$, $\tau_0 > 0$, with $q > 1/2$. It is decreasing and bounded from above by $\tau_0$, so Equation (8) is satisfied. Additionally,

$$\min_{j \le k_n} \tau_j = \tau_{k_n} = \tau_0\, k_n^{-q} \ge n^{-H_2}$$

for $H_2$ large enough, so Equation (9) is checked. Since $\theta_0 \in \Theta_\beta(L_0)$,

$$\sum_{j=1}^{j_n} \theta_{0j}^2/\tau_j^2 = \tau_0^{-2} \sum_{j=1}^{j_n} \theta_{0j}^2\, j^{2q} \le \tau_0^{-2}\, j_n \sum_{j=1}^{j_n} \theta_{0j}^2\, j^{2q-1} \le \tau_0^{-2}\, j_n \sum_{j=1}^{j_n} \theta_{0j}^2\, j^{2\beta} \le \tau_0^{-2} L_0\, j_n,$$

as soon as $2q - 2\beta < 1$. Hence by choosing $1/2 < q \le 1$, Equation (10) is verified for all $\beta > 1/2$. The prior $\Pi$ thus satisfies Condition A5.
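To make the hierarchical structure of the prior in Equation (14) concrete, here is a minimal sampling sketch. It assumes polynomially decreasing scales $\tau_j = \tau_0 j^{-q}$ and uses Knuth's multiplication method for the Poisson draw; the function names and parameter values are illustrative. Note that a Poisson draw can return $k = 0$, in which case the sampled $\theta$ is identically zero.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_sieve_prior(lam, tau0, q, rng):
    """Draw (k, theta_1..theta_k): k ~ Poisson(lam), theta_j = tau_j * N(0, 1)."""
    k = sample_poisson(lam, rng)
    theta = [tau0 * j ** (-q) * rng.gauss(0.0, 1.0) for j in range(1, k + 1)]
    return k, theta  # coordinates beyond k are point masses at 0

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_sieve_prior(lam=5.0, tau0=1.0, q=1.0, rng=rng))
```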

Since Condition A5 is satisfied, we will show in the three examples that Conditions A2 - A4 and Condition (13) hold, thus Proposition 1 applies: the posterior distribution attains the optimal rate of contraction, up to a $\log n$ term, that is $\epsilon_n(\beta) = \epsilon_0 (\log n/n)^{\beta/(2\beta+1)}$, for a distance $d_n$ which is specific to each model. This rate is adaptive in a range of smoothness $[\beta_1, \beta_2]$.
2.3.1 Density

Let us consider the density model in which the density $p$ is unknown, and we observe i.i.d. data

$$X_i \sim p, \quad i = 1, 2, \ldots, n,$$

where $p$ belongs to $\mathcal{F}$,

$$\mathcal{F} = \big\{p \text{ density on } [0, 1] : p(0) = p(1) \text{ and } \log p \in L^2(0, 1)\big\}.$$

Equality $p(0) = p(1)$ is mainly used for ease of computation. We define the parameter $\theta$ of such a function $p$, and write $p = p_\theta$, as the coefficients of $\log p_\theta$ in the Fourier basis $\psi = (\psi_j)_{j\ge 1}$, i.e. it can be represented as

$$\log p_\theta(x) = \sum_{j=1}^{\infty} \theta_j \psi_j(x) - c(\theta),$$

where $c(\theta)$ is a normalizing constant. We assign a prior to $p_\theta$ by assigning the sieve prior $\Pi$ of Equation (14) to $\theta$.

A natural choice of metric $d_n$ in this model is the Hellinger distance $d_n(\theta, \theta') = h(p_\theta, p_{\theta'}) = \big(\int (\sqrt{p_\theta} - \sqrt{p_{\theta'}})^2\, d\mu\big)^{1/2}$. Lemma 2 in Ghosal and van der Vaart (2007) shows the existence of tests satisfying A4, with the Hellinger distance.

Rivoirard and Rousseau (2012b) study this model in detail (Section 4.2.2) in order to derive a Bernstein-von Mises theorem for the density model. They prove that Conditions A2, A3 and (13) are valid in this model (see the proof of Condition (C) for A2 and (13), and the proof of Condition (B) for A3). With $D_2 = 1$, Condition A3 is written $h(p_\theta, p_{\theta'}) \le D_0\, k_n^{D_1}\, \|\theta - \theta'\|_{2,k_n}$.
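The log-density parametrization and the Hellinger distance above can be evaluated numerically. The sketch below is illustrative only: it assumes a trigonometric basis without the constant function (which would be absorbed by $c(\theta)$), namely $\psi_{2k-1}(x) = \sqrt{2}\cos(2\pi k x)$ and $\psi_{2k}(x) = \sqrt{2}\sin(2\pi k x)$, and approximates integrals by a midpoint rule on a hypothetical grid of 2000 points.

```python
import math

def psi(j, x):
    """Assumed basis: psi_{2k-1} = sqrt(2) cos(2 pi k x), psi_{2k} = sqrt(2) sin(2 pi k x)."""
    k = (j + 1) // 2
    trig = math.cos if j % 2 == 1 else math.sin
    return math.sqrt(2.0) * trig(2.0 * math.pi * k * x)

def density_on_grid(theta, grid_size=2000):
    """Values of p_theta at midpoints of a regular grid on [0, 1]."""
    xs = [(i + 0.5) / grid_size for i in range(grid_size)]
    unnorm = [math.exp(sum(t * psi(j, x) for j, t in enumerate(theta, start=1))) for x in xs]
    norm = sum(unnorm) / grid_size  # midpoint approximation of exp(c(theta))
    return [v / norm for v in unnorm]

def hellinger(theta_a, theta_b, grid_size=2000):
    """Midpoint approximation of h(p_a, p_b) = (int (sqrt p_a - sqrt p_b)^2)^(1/2)."""
    pa = density_on_grid(theta_a, grid_size)
    pb = density_on_grid(theta_b, grid_size)
    sq = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(pa, pb)) / grid_size
    return math.sqrt(sq)

if __name__ == "__main__":
    print(hellinger([0.3, -0.2], [0.0, 0.0]))
```

With this definition the distance is bounded by $\sqrt{2}$ whatever the parameters, which is the boundedness property mentioned in Remark 1.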



2.3.2 Regression 



Consider now the following nonparametric regression model

$$X_i = f(t_i) + \sigma \xi_i, \quad i = 1, \ldots, n,$$

with the regular fixed design $t_i = i/n$ in $[0, 1]$, and i.i.d. centered Gaussian errors $\sigma\xi_i$ with variance $\sigma^2$. The unknown $\sigma$ case is studied in an unpublished paper by Rousseau and Sun. They endow $\sigma$ with an Inverse Gamma (conjugate) prior. They show that this one dimensional parameter adds an $n \log(\sigma/\sigma_0)$ term in the Kullback-Leibler divergence but does not alter the rates by considering three different cases for $\sigma$, either $\sigma < \sigma_0/2$, $\sigma > 3\sigma_0/2$, or $\sigma \in [\sigma_0/2, 3\sigma_0/2]$.

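We consider now in more detail the $\sigma$ known case. Denote by $\theta$ the coefficients of a regression function $f$ in the Fourier basis $\psi = (\psi_j)_{j\ge 1}$. So for all $t \in [0, 1]$, $f$ can be represented as $f(t) = \sum_{j=1}^{\infty} \theta_j \psi_j(t)$. We assign a prior to $f$ by assigning the sieve prior $\Pi$ of Equation (14) to $\theta$.

Let $P_t^n = n^{-1} \sum_{i=1}^{n} \delta_{t_i}$ be the empirical measure of the covariates $t_i$'s, and define the square of the empirical norm by $\|f\|_{P_t^n}^2 = n^{-1} \sum_{i=1}^{n} f^2(t_i)$. We use $d_n = \|\cdot\|_{P_t^n}$.

Let $\theta \in \Theta$ and $f$ the corresponding regression. Basic algebra (see for example Lemma 1.7 in Tsybakov 2009) provides, for any two $j$ and $k$,

$$\frac{1}{n} \sum_{i=1}^{n} \psi_j(t_i)\psi_k(t_i) = \delta_{jk},$$

where $\delta_{jk}$ stands for the Kronecker delta. Hence

$$\|f\|_{P_t^n}^2 = \frac{1}{n} \sum_{i=1}^{n} \sum_{j,k} \theta_j \theta_k \psi_j(t_i)\psi_k(t_i) = \|\theta\|_2^2 = \|f\|_2^2, \tag{15}$$

where the last equality is Parseval's. It ensures Condition A3 with $D_0 = D_2 = 1$ and $D_1 = 0$.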
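The discrete orthogonality relation behind the empirical norm identity can be verified numerically on a small design. The sketch below assumes the trigonometric basis $\psi_1 = 1$, $\psi_{2k} = \sqrt{2}\cos(2\pi k t)$, $\psi_{2k+1} = \sqrt{2}\sin(2\pi k t)$ and restricts to basis indices well below $n$, since the relation only holds for a limited range of indices; the values $n = 16$ and $J = 9$ are arbitrary.

```python
import math

def psi(j, t):
    """Assumed trigonometric basis on [0, 1], indexed from j = 1."""
    if j == 1:
        return 1.0
    k = j // 2
    trig = math.cos if j % 2 == 0 else math.sin
    return math.sqrt(2.0) * trig(2.0 * math.pi * k * t)

def gram_matrix(n, J):
    """Empirical Gram matrix (1/n) sum_i psi_j(t_i) psi_k(t_i) on the design t_i = i/n."""
    ts = [i / n for i in range(1, n + 1)]
    return [[sum(psi(j, t) * psi(k, t) for t in ts) / n for k in range(1, J + 1)]
            for j in range(1, J + 1)]

def empirical_norm_sq(theta, n):
    """Squared empirical norm of f = sum_j theta_j psi_j on the regular design."""
    ts = [i / n for i in range(1, n + 1)]
    return sum(sum(th * psi(j, t) for j, th in enumerate(theta, start=1)) ** 2 for t in ts) / n

if __name__ == "__main__":
    theta = [0.5, -1.0, 0.25]
    print(empirical_norm_sq(theta, 16), sum(th * th for th in theta))
```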



The densities $\mathcal{N}(f(t_i), \sigma^2)$ of the $X_i$'s are denoted $p_{f,i}$, $i = 1, \ldots, n$, and their product $p_f^{(n)}$. The quantity $f_{0j_n}$ denotes the truncated version of $f_0$ to its first $j_n$ terms in the Fourier basis.

We have $2K(p_{f_0,i}, p_{f,i}) = V_{2,0}(p_{f_0,i}, p_{f,i}) = \sigma^{-2}(f_0(t_i) - f(t_i))^2$ and $V_{m,0}(p_{f_0,i}, p_{f,i}) = \sigma^{-m}\mu_m\, |f_0(t_i) - f(t_i)|^m$ for $m \ge 2$, where $\mu_m$ is the (non centered) $m$-th absolute moment of a standard Gaussian variable. So using Equation (15) we get

$$2K\big(p_{f_0}^{(n)}, p_f^{(n)}\big) = V_{2,0}\big(p_{f_0}^{(n)}, p_f^{(n)}\big) = n\sigma^{-2}\|f_0 - f\|_{P_t^n}^2 = n\sigma^{-2}\|\theta_0 - \theta\|_2^2,$$

which proves Condition (13).

Additionally, both $2\tilde K(p_{f_{0j_n}}^{(n)}, p_f^{(n)})$ and $\tilde V_{2,0}(p_{f_{0j_n}}^{(n)}, p_f^{(n)})$ are upper bounded by $n\sigma^{-2}\big(2\|f_{0j_n} - f\|_{P_t^n}^2 + \|f_0 - f_{0j_n}\|_{P_t^n}^2\big)$. Let $\theta \in \mathcal{A}_n(H_1)$, for a certain $H_1 > 0$. Then, using (15) again, the bound is less than

$$n\sigma^{-2}\big(2n^{-2H_1} + L_0\, j_n^{-2\beta}\big) \le C n\epsilon_n^2$$

for $H_1 > 2\beta/(2\beta + 1)$, which ensures Condition A2.

Ghosal and van der Vaart (2007) state in Section 7.7 that tests satisfying A4 exist with $d_n = \|\cdot\|_{P_t^n}$.

2.3.3 Nonlinear AR(1) model 

As a nonindependent illustration, we consider the following Markov chain: the nonlinear autoregression model whose observations $X^n = (X_1, \ldots, X_n)$ come from a stationary time series $X_t$, $t \in \mathbb{Z}$, such that

$$X_i = f(X_{i-1}) + \xi_i, \quad i = 1, 2, \ldots, n,$$

where the function $f$ is unknown and the residuals $\xi_i$ are standard Gaussian and independent of $(X_1, \ldots, X_{i-1})$. We suppose that $X_0$ is drawn from the stationary distribution.

Suppose that regression functions $f$ are in $L^2(\mathbb{R})$, and uniformly bounded by a constant $M_1$ (a bound growing with $n$ could also be considered here). We use Hermite functions $\psi = (\psi_j)_{j\ge 1}$ as an orthonormal basis of $L^2(\mathbb{R})$, such that for all $x \in \mathbb{R}$, $f(x) = f_\theta(x) = \sum_{j=1}^{\infty} \theta_j \psi_j(x)$. This basis is uniformly bounded (by Cramér's inequality). Consider the sieve prior $\Pi$ in its truncated version (3) for $\theta$, with radius $r_1$ a (possibly large) constant independent of $k$ and $n$.

We show that Conditions A1 - A4 are satisfied, along the lines of Ghosal and van der Vaart (2007), Sections 4 and 7.4. Denote by $p_\theta(y \mid x) = \varphi(y - f_\theta(x))$ the transition density of the chain, where $\varphi(\cdot)$ is the standard normal density, and where reference measures relative to $x$ and $y$ are denoted respectively by $\nu$ and $\mu$. Define $r(y) = \frac{1}{2}\big(\varphi(y - M_1) + \varphi(y + M_1)\big)$, and set $d\nu = r\, d\mu$. Then Ghosal and van der Vaart (2007) show that the chain $(X_i)_{1 \le i \le n}$ has a unique stationary distribution and prove the existence of tests satisfying A4 relative to the Hellinger semidistance $d$ whose square is given by

$$d^2(\theta, \theta') = \int\!\!\int \Big(\sqrt{p_\theta(y \mid x)} - \sqrt{p_{\theta'}(y \mid x)}\Big)^2 d\mu(y)\, d\nu(x).$$

They show that $d$ is bounded by $\|\cdot\|_2$ (which proves Condition A3) and that

$$K(p_0, p_\theta) \lesssim \|\theta_0 - \theta\|_2^2 \quad \text{and} \quad V_{2,0}(p_0, p_\theta) \lesssim \|\theta_0 - \theta\|_2^2.$$

Thus Equation (13) holds. Condition A2 follows from inequalities $\tilde K(p_{0j_n}, p_\theta) \lesssim \sum_{j=1}^{j_n} |\theta_{0j} - \theta_j|$ and $\tilde V_{2,0}(p_{0j_n}, p_\theta) \lesssim \|\theta_{0j_n} - \theta\|_2^2$, for $\theta \in \Theta_{j_n}$.



3 Application to the white noise model 



Consider the Gaussian white noise model

$$dX^n(t) = f_0(t)\, dt + \frac{1}{\sqrt{n}}\, dW(t), \quad 0 \le t \le 1, \tag{16}$$

in which we observe processes $X^n(t)$, where $f_0$ is the unknown function of interest belonging to $L^2(0, 1)$, $W(t)$ is a standard Brownian motion, and $n$ is the sample size. We assume that $f_0$ lies in a Sobolev ball, $\Theta_\beta(L_0)$, see (11). Brown and Low (1996) show that this model is asymptotically equivalent to the nonparametric regression (assuming $\beta > 1/2$). It can be written as the equivalent infinite normal mean model using the decomposition in a Fourier basis $\psi = (\psi_j)_{j\ge 1}$ of $L^2(0, 1)$,

$$X_j^n = \theta_{0j} + \frac{1}{\sqrt{n}}\, \xi_j, \quad j = 1, 2, \ldots, \tag{17}$$

where $X_j^n = \int_0^1 \psi_j(t)\, dX^n(t)$ are the observations, $\theta_{0j} = \int_0^1 \psi_j(t) f_0(t)\, dt$ the Fourier coefficients of $f_0$, and $\xi_j = \int_0^1 \psi_j(t)\, dW(t)$ are independent standard Gaussian random variables. The function $f_0$ and the parameter $\theta_0$ are linked through the relation in $L^2(0, 1)$, $f_0 = \sum_{j=1}^{\infty} \theta_{0j} \psi_j$.

In addition to results in concentration, we are interested in comparing the risk of an estimate $\hat f_n$ corresponding to basis coefficients $\hat\theta_n$, under two different losses: the global $L^2$ loss (if expressed on curves $f$, or $l^2$ loss if expressed on $\theta$),

$$R_n^{L^2}(\theta_0) = E_0^{(n)} \|\hat f_n - f_0\|_2^2 = E_0^{(n)} \sum_{j=1}^{\infty} \big(\hat\theta_{nj} - \theta_{0j}\big)^2,$$

and the local loss at point $t \in [0, 1]$,

$$R_n^{\mathrm{loc}}(\theta_0, t) = E_0^{(n)} \big(\hat f_n(t) - f_0(t)\big)^2 = E_0^{(n)} \Big(\sum_{j=1}^{\infty} a_j \big(\hat\theta_{nj} - \theta_{0j}\big)\Big)^2,$$

with $a_j = \psi_j(t)$. Note that the difference between global and local risk expressions in basis coefficients comes from the position of the parenthesis with respect to the square: respectively the sum of squares and the square of a sum.
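The contrast between the two losses can be seen on a toy example. The sketch below assumes the trigonometric basis $\psi_1 = 1$, $\psi_{2k} = \sqrt{2}\cos(2\pi k t)$, $\psi_{2k+1} = \sqrt{2}\sin(2\pi k t)$ and finitely many coefficients; the vectors are made-up values for illustration.

```python
import math

def psi(j, t):
    """Assumed trigonometric basis on [0, 1], indexed from j = 1."""
    if j == 1:
        return 1.0
    k = j // 2
    trig = math.cos if j % 2 == 0 else math.sin
    return math.sqrt(2.0) * trig(2.0 * math.pi * k * t)

def global_loss(theta_hat, theta0):
    """Sum of squares of the coefficient errors (l2 loss)."""
    return sum((a - b) ** 2 for a, b in zip(theta_hat, theta0))

def local_loss(theta_hat, theta0, t):
    """Square of the sum of the weighted coefficient errors (pointwise loss at t)."""
    pairs = enumerate(zip(theta_hat, theta0), start=1)
    return sum(psi(j, t) * (a - b) for j, (a, b) in pairs) ** 2

if __name__ == "__main__":
    theta_hat, theta0 = [1.0, 0.5, 0.0], [0.8, 0.5, 0.3]
    print(global_loss(theta_hat, theta0), local_loss(theta_hat, theta0, 0.0))
```

Errors on different coordinates can cancel or accumulate in the local loss depending on the signs of $a_j$, which is not possible in the global loss.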

We show that sieve priors allow one to construct an adaptive estimate in global risk. However, the same estimate does not perform as well under the pointwise loss, which illustrates the result of Cai et al. (2007). We provide a lower bound for the pointwise rate.



3.1 Adaptation under global loss 



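Consider the global $l^2$ loss on $\theta_0$. The likelihood ratio is given by

$$\frac{p_0^{(n)}}{p_\theta^{(n)}}(X^n) = \exp\Big(n \langle \theta_0 - \theta, X^n \rangle - \frac{n}{2}\|\theta_0\|_2^2 + \frac{n}{2}\|\theta\|_2^2\Big),$$

where $\langle \cdot, \cdot \rangle$ denotes the $l^2$ scalar product. We choose here the $l^2$ distance as $d_n(\theta, \theta') = \|\theta - \theta'\|_2$. Let us check that Conditions A2 - A4 and Condition (13) hold.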



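The choice of $d_n$ ensures Condition A3 with $D_0 = D_2 = 1$ and $D_1 = 0$. The test statistic of $\theta_0$ against $\theta_1$ associated with the likelihood ratio is $\phi_n(\theta_1) = \mathbb{I}\big(2\langle \theta_1 - \theta_0, X^n \rangle > \|\theta_1\|_2^2 - \|\theta_0\|_2^2\big)$. With Lemma 5 of Ghosal and van der Vaart (2007) we have that $E_0^{(n)}(\phi_n(\theta_1)) \le e^{-n\|\theta_1 - \theta_0\|_2^2/4}$ and $E_\theta^{(n)}(1 - \phi_n(\theta_1)) \le e^{-n\|\theta_1 - \theta_0\|_2^2/4}$ for $\theta$ such that $\|\theta - \theta_1\|_2 \le \|\theta_1 - \theta_0\|_2/4$. It provides a test as in Condition A4 with $c_1 = \zeta = 1/4$.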



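Moreover, following Lemma 6 of Ghosal and van der Vaart (2007) we have

$$K\big(p_0^{(n)}, p_\theta^{(n)}\big) = n\|\theta - \theta_0\|_2^2/2 \quad \text{and} \quad V_{2,0}\big(p_0^{(n)}, p_\theta^{(n)}\big) = n\|\theta - \theta_0\|_2^2.$$

For $m \ge 2$, we have

$$V_{m,0}\big(p_0^{(n)}, p_\theta^{(n)}\big) = \int p_0^{(n)} \Big|\log\big(p_0^{(n)}/p_\theta^{(n)}\big) - K\big(p_0^{(n)}, p_\theta^{(n)}\big)\Big|^m d\mu = n^m \int p_0^{(n)} \big|\langle \theta_0 - \theta, X^n - \theta_0 \rangle\big|^m d\mu \le n^m \|\theta_0 - \theta\|_2^m \int p_0^{(n)} \|X^n - \theta_0\|_2^m\, d\mu.$$

The centered $m$-th moment of the Gaussian variable $X^n$ is proportional to $n^{-m/2}$, and Condition (13) is satisfied.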
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The same calculation shows that Condition A'^ is satisfied: for all B S 8j„, 



K 



and 



Conditions A2 - A4 and Condition (13 1 hold, if moreover A4 is satisfied, then 



by Proposition^ the procedure is adaptive, which is expressed in the following 
proposition. 



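Proposition 2. Under the prior $\Pi$ defined in Equations (14), the global $l^2$ rate
of posterior contraction is optimal adaptive for the Gaussian white noise model,
i.e. for $M$ large enough and $\beta_2 \ge \beta_1 > 1/2$

$$\sup_{\beta_1 \le \beta \le \beta_2} \ \sup_{\theta_0 \in \Theta_\beta(L_0)} E_0^{(n)} \Pi\Big[ \theta : \|\theta - \theta_0\|_2^2 \ge M \frac{\log n}{L(n)} \varepsilon_n^2(\beta) \,\Big|\, X^n \Big] \underset{n \to \infty}{\longrightarrow} 0,$$

with $\varepsilon_n(\beta) = \varepsilon_0 \,(\log n / n)^{\beta/(2\beta+1)}$.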



The distance here is not bounded, so Corollary 1 does not hold. For deriving a
risk rate, we need a more subtle result than Theorem 1 that we can obtain when
considering sets $S_{n,j}(M) = \{\theta : M \frac{\log n}{L(n)} (j+1) \varepsilon_n^2 \ge \|\theta - \theta_0\|_2^2 \ge M \frac{\log n}{L(n)} j \varepsilon_n^2\}$,
$j = 1, 2, \ldots$ instead of $S_n(M) = \{\theta : \|\theta - \theta_0\|_2^2 \ge M \frac{\log n}{L(n)} \varepsilon_n^2\}$. Then the bound
of the expected posterior mass of $S_{n,j}(M)$ becomes

$$E_0^{(n)} \Pi(S_{n,j}(M) \,|\, X^n) \le c_7 (n j \varepsilon_n^2)^{-m/2} \qquad (18)$$

for a fixed constant $c_7$. Hence we obtain the following rate of convergence in
risk.



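Proposition 3. Under Condition (13) with $m > 5$, the expected posterior risk
given $\theta_0$ and $\Pi$ converges at least at the same rate $\varepsilon_n$,

$$\mathcal{R}_n^{L^2}(\theta_0) = E_0^{(n)} \Pi\big[ \|\theta - \theta_0\|_2^2 \,\big|\, X^n \big] = O(\varepsilon_n^2)$$

for any $\theta_0$. So the procedure is risk adaptive as well (up to a $\log(n)$ term).

Proof. We have

$$\mathcal{R}_n^{L^2}(\theta_0) = E_0^{(n)} \Pi\Big[ \Big( 1(\theta \notin S_n(M)) + \sum_{j \ge 1} 1(\theta \in S_{n,j}(M)) \Big) \|\theta - \theta_0\|_2^2 \,\Big|\, X^n \Big].$$

Due to (18), the last sum in $j$ converges as soon as $m > 5$. This is possible
in the white noise setting because the conditions are satisfied whatever $m$, so
the stated bound on $\mathcal{R}_n^{L^2}(\theta_0)$ follows.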
We have shown that, conditional on the existence of a sieve prior for the white
noise model satisfying the required conditions (cf. Section 2.3), the procedure has
minimax rates (up to a $\log(n)$ term) both in contraction and in risk. We now study
the asymptotic behaviour of the posterior under the local loss function.



3.2 Lower bound under pointwise loss 

The previous section derives rates of convergence under the global loss. Here,
under the pointwise loss, we show that the risk deteriorates as a power $n$ factor
compared to the benchmark minimax pointwise risk $n^{-(2\beta-1)/(2\beta)}$ (note the
difference with the global minimax rate $n^{-2\beta/(2\beta+1)}$, both given for risks on squares).

We use the sieve prior defined as a conditional Gaussian prior in Equation (14).
Denote by $\hat\theta_n$ the Bayes estimate of $\theta$ (the posterior mean). Then the following
proposition gives a lower bound on the risk (pointwise square error) under a
pointwise loss:

Proposition 4. If the point $t$ is such that $a_j = \psi_j(t) = 1$ for all $j$ ($t = 0$), then
for all $\beta > q$, for all $L_0 > 0$, a lower bound on the risk rate under pointwise loss
is given by

$$\sup_{\theta_0 \in \Theta_\beta(L_0)} R_n^{loc}(\theta_0, t) \gtrsim n^{-\frac{2\beta-1}{2\beta+1}} / \log^2 n.$$

Proof. See the Appendix. □



Cai et al. (2007) show that a global optimal estimator cannot be pointwise
optimal. The sieve prior leads to an (almost, up to a $\log n$ term) optimal global
risk and Proposition 4 shows that the pointwise risk associated to the posterior
mean $\hat\theta_n$ is suboptimal with a power of $n$ penalty, whose exponent is

$$\frac{2\beta - 1}{2\beta} - \frac{2\beta - 1}{2\beta + 1} = \frac{2\beta - 1}{2\beta(2\beta + 1)}.$$

The maximal penalty is for $\beta = (1 + \sqrt{2})/2$, and it vanishes as $\beta$ tends to $1/2$
and $+\infty$ (see Figure 2). Abramovich et al. (2007a) also derive such a power
$n$ penalty on the maximum local risk of a globally optimal Bayesian estimate, as
well as on the reverse case (maximum global risk of a locally optimal Bayesian
estimate).

Fig. 2: Variation of the exponent of the penalty in a log scale for $\beta$ between 1/2
and 100; it is maximum for $\beta = (1 + \sqrt{2})/2$
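The location of the maximal penalty can be checked numerically. The sketch below evaluates the exponent $(2\beta-1)/(2\beta) - (2\beta-1)/(2\beta+1)$ on a grid of $\beta$ between 1/2 and 100 and reports the maximiser; the grid step and the function name `penalty_exponent` are arbitrary choices made for the illustration.

```python
import math


def penalty_exponent(beta):
    """Gap between the pointwise minimax exponent and the exponent of Proposition 4."""
    return (2 * beta - 1) / (2 * beta) - (2 * beta - 1) / (2 * beta + 1)


# Grid of smoothness values from 0.5 to 100 with step 0.001.
grid = [0.5 + 0.001 * i for i in range(0, 99_501)]
best = max(grid, key=penalty_exponent)

print("grid maximiser:", best)
print("closed form (1 + sqrt(2)) / 2:", (1 + math.sqrt(2)) / 2)
print("maximal exponent:", penalty_exponent(best))
```

Setting the derivative of $(2\beta-1)/(2\beta(2\beta+1))$ to zero gives $4\beta^2 - 4\beta - 1 = 0$, whose positive root is $(1+\sqrt{2})/2$; the corresponding maximal exponent is $3 - 2\sqrt{2}$, roughly 0.17.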

Remark 5. This result is not anecdotal and illustrates the fact that the Bayesian
approach is well suited for loss functions that are related to the Kullback-Leibler
divergence (i.e. often the $l^2$ loss). The pointwise loss does not satisfy this since
it corresponds to an unsmooth linear functional of $\theta$. This possible suboptimality
of the posterior distribution of some unsmooth functional of the parameter
has already been noticed in various other cases, see for instance Rivoirard and
Rousseau (2012b) or Rousseau and Kruijer (2011). The existence of a fully Bayesian
adaptive procedure to estimate $f_0(t) = \sum_{j=1}^{\infty} a_j \theta_{0j}$
remains an open question.

A Appendix

A.1 Three technical lemmas

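Set $S_n(M) = \{\theta : d_n^2(\theta, \theta_0) \ge M \frac{\log n}{L(n)} \varepsilon_n^2\}$ and recall that $\Theta_{k_n}(Q) = \{\theta \in \Theta_{k_n} :
\|\theta\|_{2,k_n} \le n^Q\}$, $Q > 0$. We begin with three technical lemmas.

Lemma 1. If Conditions $A_3$ and $A_4$ hold, then there exists a test $\phi_n$ such that
for $M$ large enough, there exists a constant $c_2$ such that

$$E_0^{(n)}(\phi_n) \le e^{-c_2 M \frac{\log n}{L(n)} n\varepsilon_n^2} \quad \text{and} \quad E_\theta^{(n)}(1 - \phi_n) \le e^{-c_2 M \frac{\log n}{L(n)} n\varepsilon_n^2}$$

for all $\theta \in S_n(M) \cap \Theta_{k_n}(Q)$.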
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Proof. Set r„ = {^J ^'^1^}^^^^''' ■ set S„{M) n O^jg) is compact 
relative to the $l^2$ norm. Consider a covering of this set by $l^2$ balls of radius $r_n$ and
centre $\theta^{(i)}$. Its number of elements is $\eta_n \le (C n^Q / r_n)^{k_n} \le \exp(C k_n \log n) \le
\exp(C \frac{\log n}{L(n)} n\varepsilon_n^2)$ due to relation (5).

For each centre $\theta^{(i)} \in S_n(M) \cap \Theta_{k_n}(Q)$, there exists a test $\phi_n(\theta^{(i)})$ satisfying
Condition $A_4$. We define the test $\phi_n = \max_i \phi_n(\theta^{(i)})$ which satisfies

^0 (<^nj — '7nC * ' " < e ^<"' " " < e ^<"' 

for $M$ large enough and a constant $c_2$.

Here, Condition $A_3$ allows us to switch from the covering in terms of the $l^2$ distance
to a covering expressed in terms of $d_n$: each $\theta \in S_n(M) \cap \Theta_{k_n}(Q)$ which lies in
an $l^2$ ball of centre $\theta^{(i)}$ and of radius $r_n$ in the covering of size $\eta_n$ also lies in a
$d_n$ ball of adequate radius.

Then there exists a constant $c_2$ (the minimum with the previous one) such that

$$\sup_{\theta \in S_n(M) \cap \Theta_{k_n}(Q)} E_\theta^{(n)}(1 - \phi_n) \le e^{-c_2 M \frac{\log n}{L(n)} n\varepsilon_n^2},$$

hence the result follows. □

Lemma 2. Under Condition $A_5$, for any constant $c_6 > 0$, there exist positive
constants $Q$, $C$ and $M_0$ such that

$$\Pi(\Theta_{k_n}^c(Q)) \le C e^{-c_6 n\varepsilon_n^2}, \qquad (19)$$

where $M_0$ is introduced in the definition (5) of $k_n$, and $\Theta_{k_n}^c(Q)$, the complement
of $\Theta_{k_n}(Q)$, is taken in $\Theta$.

Proof. $\Theta_{k_n}^c(Q)$ is written as $\Theta_{k_n}^c(Q) = \{\theta \in \Theta : \|\theta\|_{2,k_n} > n^Q$ or $\exists j >
k_n$ s.t. $\theta_j \ne 0\}$, so its prior mass is less than $\pi(k > k_n) + \sum_{k \le k_n} \pi(k)\, \Pi_k(\theta \in \Theta_k :
\|\theta\|_{2,k} > n^Q)$, where the last sum is less than $\Pi_{k_n}(\theta \in \Theta_{k_n} : \|\theta\|_{2,k_n} > n^Q)$
because its terms are increasing.

The prior mass of sieves that exceed $k_n$ is controlled by Equation (6). We have

j>k„ j>kn 

Since $L$ is a slowly varying function, we have $k_n L(k_n) \gtrsim j_n \log(n) \gtrsim n\varepsilon_n^2$. Hence
$\pi(k > k_n) \le C e^{-c_6 n\varepsilon_n^2}$ for a constant $c_6$ as large as needed since it is determined
by constant $M_0$ in Equation (5).

Then by the second part of Condition (7), $\Pi_{k_n}(\theta \in \Theta_{k_n} : \|\theta\|_{2,k_n} > n^Q)$ is less
than



/ n 9iOj/r,)/T,d9,, 

< (Gan^^)^" / eM~Giy]\0jr/T^)de,, (20) 



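by using the lower bound on the $\tau_j$'s of Equation (9).
If $\alpha \ge 2$, then applying Hölder's inequality, one obtains

$$\|\theta\|_{2,k_n} \le k_n^{1/2 - 1/\alpha} \Big( \sum_{j=1}^{k_n} |\theta_j|^\alpha \Big)^{1/\alpha},$$

which leads to

$$\sum_{j=1}^{k_n} |\theta_j|^\alpha \ge k_n^{1 - \alpha/2} n^{Q\alpha}.$$

If $\alpha < 2$, then a classical result states that the $l^\alpha$ norm $\|\cdot\|_\alpha$ is larger than the
$l^2$ norm $\|\cdot\|_2$, i.e.

$$\sum_{j=1}^{k_n} |\theta_j|^\alpha \ge \|\theta\|_{2,k_n}^\alpha \ge n^{Q\alpha}.$$

Eventually the upper bound $\tau_0$ on the $\tau_j$'s of Equation (8) provides

$$\sum_{j=1}^{k_n} |\theta_j|^\alpha / \tau_j^\alpha \ge \tau_0^{-\alpha} n^{Q\alpha} \min(k_n^{1 - \alpha/2}, 1).$$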



The integral in the right-hand side of ( 20 1 is bounded by 



k 
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The last integral is bounded by C'^" , so 

2 

The right-hand side of the last inequality can be made smaller than $C e^{-c_6 n\varepsilon_n^2}$
for any constant $C$ and $c_6$ provided that $Q$ is chosen large enough. This entails
result (19).

In the truncated case we note that if $\sum_{j=1}^{k} |\theta_j| \le r_1$, then $\sum_{j=1}^{k} \theta_j^2 \le r_1^2$,
so that for $n$ large enough, $\Pi(\Theta_{k_n}^c(Q)) = \pi(k > k_n)$, and the rest of the proof
is similar. □

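Lemma 3. Under Conditions $A_1$, $A_2$ and $A_5$, there exists $c_4 > 0$ such that

$$\Pi(B_n(m)) \ge e^{-c_4 n\varepsilon_n^2}.$$

Proof. Let $\theta \in A_n(H_1)$. For $n$ large enough, Conditions $A_1$ and $A_2$ imply that

$$K(p_0^{(n)}, p_\theta^{(n)}) \le K(p_0^{(n)}, p_{0j_n}^{(n)}) + K(p_{0j_n}^{(n)}, p_\theta^{(n)}) \le 2 n\varepsilon_n^2$$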

and 

m 



which yields $A_n(H_1) \subset B_n(m)$, so that a lower bound for $\Pi(B_n(m))$ is given by
$\Pi(A_n(H_1))$. Note that for $H_0 > H_1$,

$$A_n(H_0) \subset A_n(H_1) \subset B_n(m). \qquad (21)$$

We have

$$\Pi(A_n(H_1)) = \sum_{k=1}^{\infty} \pi(k)\, \Pi_k(A_n(H_1)) \ge \pi(j_n)\, \Pi_{j_n}(A_n(H_1)).$$

By the first part of Condition ^ we have 

^(Jn)>e-^"^'^"^ >e~^"^", (22) 
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for C4 large enough. Now by the first part of Condition Q and by Condition 



lie-eo 



n,„(A.(i?i)) = / 

-'IK 



(23) 



e-e, 



exp(-G2^|f?,|7T;)d0, 



We can bound above " by n"^^ by Equation M as j < j„ < fcn- We write 
|6lj|" < 2" (|6loj|" + |6'j - 6'oj r). First, Equation (|lO]) gives 

^lsr/T;<cj„iogn. 

Then, if a > 2 

and if a < 2 then Holder's inequality provides 
In both cases we have 



E I^J-r < 2"(Cj„ logn + 

so choosing $H_2 < H_1$ ensures that the latter is bounded by $j_n \log n$. Last, the integral
over the ball in dimension $j_n$, centred at $\theta_{0j_n}$ and of radius $n^{-H_1}$, is at least
equal to $e^{-C j_n \log n}$, for some given positive constant $C$.

Noting that $j_n = \lfloor j_0 n\varepsilon_n^2 / \log(n) \rfloor$ and choosing $H_1$ large enough, which is possible
by Equation (21), ensures the existence of $c_4 > 0$ such that $\Pi_{j_n}(A_n(H_1)) \ge e^{-c_4 n\varepsilon_n^2/2}$.
Combining this with (22) allows us to conclude.

In the truncated case (3), we can first choose $r_1$ larger than $2\sum_{j=1}^{\infty} |\theta_{0j}|$. If
$\theta \in A_n(H_1)$, then $\sum_{j=1}^{j_n} |\theta_j| \le \sum_{j=1}^{j_n} (|\theta_j - \theta_{0j}| + |\theta_{0j}|) \le j_n n^{-H_1} + r_1/2 \le r_1$
for $n$ and $H_1$ large enough. So the expression of integral (23) is still valid. □

A.2 Theorem 1

Proof (of Theorem 1).

Express the quantity of interest 11 (iS„(Af)|X") in terms of Nn, Nn and Dn 
defined as follows 

Denote $\rho_n(c_3) = \exp(-(c_3 + 1) n\varepsilon_n^2)\, \Pi(B_n(m))$ for $c_3 > 0$. Introduce $\phi_n$, the test
statistic of Lemma 1, and take the expectation of the posterior mass of $S_n(M)$
as follows

< (0„) + (^ ^''j^^f " (1 - 0„) < Pnics)) + > Paics)))^ 

< <^ (0J + F^"^ (A. < P„(C3)) + (24) 



Lemma 10 in Ghosal and van der Vaart (2007) gives $P_0^{(n)}(D_n \le \rho_n(c_3)) \lesssim (n\varepsilon_n^2)^{-m/2}$
for every $c_3 > 0$.

Fubini's theorem entails that $E_0^{(n)}(N_n(1 - \phi_n)) \le \sup_{\theta \in S_n(M) \cap \Theta_{k_n}(Q)} E_\theta^{(n)}(1 - \phi_n)$.
Along with $E_0^{(n)}(\phi_n)$, it is upper bounded in Lemma 1 by $e^{-c_2 M \frac{\log n}{L(n)} n\varepsilon_n^2}$.
Lemma 2 implies that $E_0^{(n)}(N_n') \le \Pi(\Theta_{k_n}^c(Q)) \lesssim e^{-c_6 n\varepsilon_n^2}$ and Lemma 3 yields
$\Pi(B_n(m)) \ge e^{-c_4 n\varepsilon_n^2}$. Constants $c_3$ and $c_4$ are fixed, so we can choose $M$, $M_0$
and $Q$ large enough for $c_6$ to be sufficiently large (see proof of Lemma 2), such
that $\min(M \frac{\log n}{L(n)} c_2, c_6) > c_3 + c_4 + 1$. It implies that the third term in Equation
(24) is bounded above by $e^{-c_5 n\varepsilon_n^2}$ for some positive $c_5$. Finally,

$$E_0^{(n)} \Pi(S_n(M) \,|\, X^n) = O\big((n\varepsilon_n^2)^{-m/2}\big) \underset{n \to \infty}{\longrightarrow} 0. \qquad \square$$

A.3 Proposition 4

The proof of the lower bound in the local risk case uses the next lemma, whose
proof follows from the Cauchy-Schwarz inequality.

Lemma 4. If $E(B_n^2) = o(E(A_n^2))$, then $E((A_n + B_n)^2) = E(A_n^2)(1 + o(1))$.
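For completeness, the Cauchy-Schwarz argument behind Lemma 4 can be written out as follows; this is a short worked derivation under the lemma's own assumption $E(B_n^2) = o(E(A_n^2))$, not a transcription of the authors' proof.

```latex
\begin{align*}
E\{(A_n + B_n)^2\} &= E(A_n^2) + 2E(A_n B_n) + E(B_n^2), \\
|E(A_n B_n)| &\le \{E(A_n^2)\}^{1/2} \{E(B_n^2)\}^{1/2}
  = E(A_n^2) \left\{ \frac{E(B_n^2)}{E(A_n^2)} \right\}^{1/2}
  = o\{E(A_n^2)\}, \\
E\{(A_n + B_n)^2\} &= E(A_n^2) + o\{E(A_n^2)\} + o\{E(A_n^2)\}
  = E(A_n^2)\{1 + o(1)\}.
\end{align*}
```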
Proof (of Proposition 4).

The coordinates of $\hat\theta_n$ are $\hat\theta_{nj} = E^{\Pi}(\theta_j | X^n) = \sum_k \pi(k|X^n)\, \hat\theta_{nj}(k)$, with
$\hat\theta_{nj}(k) = \frac{\tau_j^2}{\tau_j^2 + 1/n} X_j^n$ if $k \ge j$, and $\hat\theta_{nj}(k) = 0$ otherwise (see Zhao 2000).
Denote $u_j(X^n) = \sum_{k \ge j} \pi(k|X^n) = \pi(k \ge j | X^n)$ so that $\hat\theta_{nj} = u_j(X^n) \frac{\tau_j^2}{\tau_j^2 + 1/n} X_j^n$.
Denote $K_n = n^{1/(2\beta+1)}$ and $J_n = n^{1/(2\beta)}$. Most of the posterior mass
on $k$ is concentrated before $K_n$, in the sense that there exists a constant $c$ such
that

$$E_0^{(n)}\big(u_{K_n}(X^n)\big) \le \exp(-c K_n). \qquad (25)$$

This follows from the exponential inequality

$$P_0^{(n)}\big[u_{K_n}(X^n) > \exp(-c K_n)\big] \le \exp(-c K_n),$$

which is obtained by classic arguments in line with Theorem 1, writing the
posterior quantity $u_{K_n}(X^n)$ as a ratio $N_n / D_n$, and then using Fubini's theorem,
Chebyshev's inequality and an upper bound on $\pi(k \ge K_n)$.
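The structure of the posterior mean described above, a coordinatewise Gaussian shrinkage weighted by the posterior probability $u_j(X^n)$ that the sieve dimension reaches $j$, can be sketched numerically. The code below is a minimal illustration under assumptions that are not taken from the paper: the model is truncated to the observed coordinates, the prior scales are $\tau_j = j^{-q}$, and the prior on $k$ is proportional to $\exp(-\texttt{prior\_rate} \cdot k)$. The function name `sieve_posterior_mean` and its parameters are hypothetical.

```python
import math


def sieve_posterior_mean(x, n, q=1.0, prior_rate=1.0):
    """Posterior of k, tail weights u_j and posterior mean for a Gaussian sieve prior.

    Assumed setting (illustrative): X_j ~ N(theta_j, 1/n); given k,
    theta_j ~ N(0, tau_j^2) for j <= k and theta_j = 0 for j > k, with
    tau_j = j^(-q); prior pi(k) proportional to exp(-prior_rate * k), k = 1..len(x).
    """
    J = len(x)
    tau2 = [j ** (-2.0 * q) for j in range(1, J + 1)]
    # Log ratio of the marginal densities N(0, tau_j^2 + 1/n) and N(0, 1/n) at x_j,
    # i.e. the evidence for "coordinate j is active" against "coordinate j is zero".
    log_bf = [
        0.5 * math.log((1.0 / n) / (t + 1.0 / n))
        + 0.5 * xj * xj * (n - 1.0 / (t + 1.0 / n))
        for xj, t in zip(x, tau2)
    ]
    # Unnormalised log posterior of k: log prior plus cumulated log ratios.
    log_post, running = [], 0.0
    for k in range(1, J + 1):
        running += log_bf[k - 1]
        log_post.append(-prior_rate * k + running)
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    post_k = [w / total for w in weights]
    # u_j = posterior probability that k >= j (tail sums of post_k).
    u, tail = [0.0] * J, 0.0
    for j in range(J - 1, -1, -1):
        tail += post_k[j]
        u[j] = tail
    # Posterior mean: u_j * tau_j^2 / (tau_j^2 + 1/n) * x_j.
    theta_hat = [uj * t / (t + 1.0 / n) * xj for uj, t, xj in zip(u, tau2, x)]
    return post_k, u, theta_hat


post_k, u, theta_hat = sieve_posterior_mean([1.0, 0.5, 0.02, -0.01, 0.0], n=100)
```

The weights $u_j$ decrease with $j$, so high-frequency coordinates are shrunk twice: once by the Gaussian factor $\tau_j^2/(\tau_j^2 + 1/n)$ and once by the posterior mass of sieves that include them.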



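Due to Relation (17), we split in three the sum in the risk

$$R_n^{loc}(\theta_0, t) = E_0^{(n)} \Big( \sum_{i=1}^{\infty} a_i (\hat\theta_{ni} - \theta_{0i}) \Big)^2$$

by centring the stochastic term $X^n$ and writing
$1 - u_i(X^n) \frac{\tau_i^2}{\tau_i^2 + 1/n} = \frac{1/n}{\tau_i^2 + 1/n} + \frac{\tau_i^2}{\tau_i^2 + 1/n} (1 - u_i(X^n))$.
The idea of the proof is to show that there is a leading
term in the sum, and to apply Lemma 4.

Let $R_1 = \Big( \sum_{i=1}^{\infty} a_i \frac{1/n}{\tau_i^2 + 1/n} \theta_{0i} \Big)^2$,
$R_2 = E_0^{(n)} \Big( \sum_{i=1}^{\infty} a_i \frac{\tau_i^2}{\tau_i^2 + 1/n} (1 - u_i(X^n)) \theta_{0i} \Big)^2$
and $R_3 = E_0^{(n)} \Big( \sum_{i=1}^{\infty} a_i u_i(X^n) \frac{\tau_i^2}{\tau_i^2 + 1/n} \frac{\xi_i}{\sqrt{n}} \Big)^2$.
By using the Cauchy-Schwarz inequality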

(OO 

oo 
i=l 

because the a^'s are bounded. If 2/3 — > 1, then we can write 

oo 
i=l 

and if 2/3 — 4g < 1, then comparing to an integral provides 



< 



_ 2/3-1 _ 2/3-1 

< n 2, <n , 



where the last inequality holds because q is chosen such that q < p. Then 

For $k = 2, 3$, denote $R_k(b_n, c_n)$ the partial sum of $R_k$ from $j = b_n$ to $c_n$. Then
$R_2(1, J_n)$ is the larger term in the decomposition, and is treated at the end of
the section. The upper part $R_2(J_n, \infty)$ is easily bounded by

We split $R_3(1, J_n)$ in two parts $R_{3,1}(1, J_n)$ and $R_{3,2}(1, J_n)$ by writing $u_i(X^n) =
u_{J_n}(X^n) + \pi(i \le k < J_n | X^n)$ for all $i \le J_n$:



j=i i=i 



nRs{l,Jn) < E("Mg7rO-|X")^a,^^, 



.7,, \ 2 



i=l 



:= -^3,1(1, -^n) + -^3,2(1, ^n)- 
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Let rj„(X") = We have X^^^i 7^01^") < 1 so we can apply 

Jensen's inequality, 

i?3,i(l,J„) < e[,") ^g7r(j|X")r,„(X") 
< E(")max{r,„(X")2}. 

Noting that (rjn(X"))j^^^.^j is a martingale, we get using Doob's inequality 

J„ 2 

i=l 

The second term i?3,2(li ^^n) can be upper bounded in the same way as for 



i?3(J„,oo) in Equation (26 I below by noting that 



For the upper part i?3(J„,oo), we use the bound (25 1 on Eq"'' 



=-R',i 



< e[,"^ 



\i=K„ 



(26) 



< 



< 



1/2 



1/2 



1/2 



E ?7+v 



1/2 



where we bound the different moments of $\xi_i$ by a unique constant and then
use $\sum_{i \ge K_n} \tau_i^2 / (\tau_i^2 + 1/n) = O(n^{1/(2q)})$. Then $R_3 = O(n^{-(2\beta-1)/(2\beta)})$.

To sum up, $R_2(1, J_n)$ is the only remaining term. We build an example where it
is of greater order than $n^{-(2\beta-1)/(2\beta)}$. Let $\theta_0$ be defined by its coordinates $\theta_{0i} =
i^{-\beta - 1/2} (\log(i + 1))^{-1}$ such that the series $\sum_i \theta_{0i}^2 i^{2\beta}$ converges, so $\theta_0$ belongs to
the Sobolev ball of smoothness $\beta$. It is assumed that $a_i = \psi_i(t) = 1$, so all terms
in the sum $R_2(1, J_n)$ are positive, hence

$$R_2(1, J_n) \ge \frac{1}{4} E_0^{(n)} \Big( \sum_{i=K_n}^{J_n} (1 - u_i(X^n)) \theta_{0i} \Big)^2,$$

noting that for $i \le J_n$, we have $n\tau_i^2 \ge n^{1 - q/\beta} \ge 1$ because $q < \beta$ and $n \ge 1$, so
$\tau_i^2 / (\tau_i^2 + 1/n) \ge 1/2$. Moreover, $u_i(X^n)$ decreases with $i$, so

$$R_2(1, J_n) \ge \frac{1}{4} E_0^{(n)} \big( (1 - u_{K_n}(X^n))^2 \big) \Big( \sum_{i=K_n}^{J_n} \theta_{0i} \Big)^2,$$

where $E_0^{(n)} \big( (1 - u_{K_n}(X^n))^2 \big)$ is lower bounded by a positive constant for $n$
large enough. Comparing the series $\sum_{i=K_n}^{J_n} \theta_{0i}$ to an integral shows that it is
bounded from below by $K_n^{-\beta + 1/2} / \log n$. We obtain by using Lemma 4 that
$R_n^{loc}(\theta_0, t) = R_2(1, J_n)(1 + o(1)) \gtrsim n^{-\frac{2\beta-1}{2\beta+1}} / \log^2 n$, which ends the proof. □





